Predicting Document Coverage for Relation Extraction
Abstract
This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): Does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity, and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
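As a rough illustration of the reported metric, the sketch below computes an F1 score for coverage predictions. It assumes a binary formulation (each document is labeled high-coverage or not for a given entity), which may differ from the paper's actual evaluation setup; the function name and the example labels are hypothetical and are not taken from the paper's dataset.

```python
def f1_score(gold, predicted):
    """F1 for binary labels, where True means 'high coverage' (assumed setup)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    if tp == 0:
        # No correct positive prediction: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for six documents about one entity.
gold = [True, False, True, False, False, True]
predicted = [True, True, False, False, False, True]
score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")
```

F1 is the harmonic mean of precision and recall, so it rewards a predictor only when it both finds the high-coverage documents and avoids flagging low-coverage ones.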
Similar resources
Multi-Document Summarisation Using Generic Relation Extraction
Experiments are reported that investigate the effect of various source document representations on the accuracy of the sentence extraction phase of a multidocument summarisation task. A novel representation is introduced based on generic relation extraction (GRE), which aims to build systems for relation identification and characterisation that can be transferred across domains and tasks withou...
Single-Document Keyphrase Extraction for Multi-Document Keyphrase Extraction
Here, we address the task of assigning relevant terms to thematically and semantically related sub-corpora and achieve superior results compared to the baseline performance. Our results suggest that more reliable sets of keyphrases can be assigned to the semantically and thematically related subsets of some corpora if the automatically determined sets of keyphrases for the individual documents ...
Collective Cross-Document Relation Extraction Without Labelled Data
We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). F...
Global Relation Embedding for Relation Extraction
Recent studies have shown that embedding textual relations using deep neural networks greatly helps relation extraction. However, many existing studies rely on supervised learning; their performance is dramatically limited by the availability of training data. In this work, we generalize textual relation embedding to the distant supervision setting, where much larger-scale but noisy training dat...
Large-Coverage Root Lexicon Extraction for Hindi
This paper describes a method using morphological rules and heuristics, for the automatic extraction of large-coverage lexicons of stems and root word-forms from a raw text corpus. We cast the problem of high-coverage lexicon extraction as one of stemming followed by root word-form selection. We examine the use of POS tagging to improve precision and recall of stemming and thereby the coverage ...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2022
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00456